-
Couldn't load subscription status.
- Fork 321
feat: Optimize and fix filtering on toStartOfX primary key expressions #1265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦋 Changeset detectedLatest commit: 19fd772 The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Pull Request ReviewThis PR introduces an optimization for filtering on ✅ Strengths
🔍 Code Quality ObservationsStrong Points:
Minor Suggestions:1. Error handling in The function logs an error but returns const columnArgument = args[0];
if (columnArgument == null) {
console.error(`Failed to parse column argument from ${expr}`);
return undefined; // Explicit return
}2. Documentation in hook (packages/app/src/hooks/useOptimizedTimestampValueExpression.ts:63) The condition check on line 63 seems to exclude Metric sources from optimization. Consider adding a comment explaining why: // Metrics use a different schema structure that doesn't benefit from this optimization
if (source.kind === SourceKind.Metric)
return source.timestampValueExpression;3. Test naming (packages/app/src/hooks/tests/useOptimizedTimestampValueExpression.test.tsx:120) The test description says "ungracefully" but it actually handles the case gracefully by returning it('should return undefined when source is null', () => {4. Regex pattern robustness (packages/common-utils/src/renderChartConfig.ts:487) The regex 🎯 Performance ConsiderationsPositive:
Consider:
🔒 Security ConcernsNo major security issues identified. The code:
📊 Test CoverageExcellent coverage:
One suggestion: 📝 Additional Observations
🎓 Code Style Alignment with CLAUDE.md✅ Follows TypeScript strict typing guidelines 💡 RecommendationLGTM with minor suggestions. This is a well-implemented optimization with excellent test coverage. The minor suggestions above are optional improvements that don't block merging. The PR successfully addresses the performance issue while maintaining code quality and backward compatibility. Great work! 🚀 |
E2E Test Results✅ All tests passed • 26 passed • 3 skipped • 216s
|
2e4ca2e to
1ca6d15
Compare
1ca6d15 to
39bc709
Compare
39bc709 to
36275d6
Compare
Pull Request Review - HyperDX #1265SummaryThis PR implements an optimization for ClickHouse queries with primary keys that use toStartOfX functions. The changes automatically detect and utilize these primary key expressions to improve query performance, and fix filtering issues with time ranges. Strengths1. Excellent Test Coverage
2. Well-Designed API
3. Performance Optimization
4. Backward Compatibility
Issues and Concerns1. Potential SQL Injection Risk (HIGH PRIORITY)Location: packages/common-utils/src/renderChartConfig.ts:560, 566 The code directly interpolates the function name and arguments into SQL. While toStartOf.formattedRemainingArgs comes from parsing timestamp expressions, it is directly injected into SQL without sanitization. Recommendation:
2. Error Handling Could Be More SpecificLocation: packages/common-utils/src/renderChartConfig.ts:521-523 The catch block uses console.log instead of a proper logger and silently swallows all errors. This makes debugging difficult. Recommendation: Use a proper logger with appropriate level and include context about the failure. 3. Regex Pattern Could Be More RobustLocation: packages/common-utils/src/utils.ts:598 The pattern (toStartOf\w+) matches any word characters, which could match invalid function names like toStartOfXYZ123. Recommendation: Use an explicit allowlist of valid ClickHouse toStartOf functions. 4. Type Safety IssueLocation: packages/common-utils/src/renderChartConfig.ts:511-515 No type guard or validation that primary_key exists and is a string before using it. Recommendation: Add proper type checking before destructuring. 5. Magic Strings in LogicLocation: packages/common-utils/src/utils.ts:645-649 Hard-coded function names like toUnixTimestamp and toDateTime make the code fragile and difficult to maintain. Recommendation: Extract to constants or a configuration object. Testing Recommendations
Security ReviewOverall Security Posture: MEDIUM RISK The main security concern is the direct string interpolation of parsed timestamp expressions into SQL queries. While the code uses the chSql template tag for parameterization in most places, the formattedRemainingArgs is directly inserted. Action Items:
ConclusionThis is a well-implemented feature with excellent test coverage that solves a real performance problem. The core logic is sound and the approach is pragmatic. However, there are security concerns around SQL injection that should be addressed before merging. The error handling and input validation could also be improved. Recommendation: Approve with required changes
The test coverage is exemplary and gives high confidence in the correctness of the implementation. |
36275d6 to
e983faa
Compare
e983faa to
711c169
Compare
711c169 to
1e7f9bb
Compare
1e7f9bb to
7fa8ec0
Compare
7fa8ec0 to
779f63c
Compare
| } catch (e) { | ||
| console.log('Failed to optimize timestampValueExpression', e); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clicking through the app, I don't run into this error, and don't expect it to happen, but am adding this try catch here as an extra layer of protection against an error either in getTableMetadata or optimizeTimestampValueExpression, in which case we can always fall back to the raw timestampValueExpression used before
779f63c to
0646139
Compare
Code Review✅ Overall AssessmentNo critical issues found. The PR successfully addresses the optimization and fix for timestamp filtering with 🎯 Strengths
|
0646139 to
f2196b9
Compare
f2196b9 to
c2726c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding testing instructions! That made it clear how to test the changes 🙏
I tried both the Timestamp Column fix and the new table PK schema and both worked as expected!
LGTM!
Closes HDX-2576
Closes HDX-2491
Summary
It is a common optimization to have a primary key like
toStartOfDay(Timestamp), ..., Timestamp. This PR improves the experience when using such a primary key in the following ways:toStartOfDay(Timestamp)andTimestampin this case, instead of justTimestamp. This improves performance by better utilizing the primary index. Previously, this required a manual change to the source's Timestamp Column setting.toStartOfXfunction to the right-hand-side of timestamp comparisons. So when filtering using an expression liketoStartOfDay(Timestamp), the generated SQL will have the conditiontoStartOfDay(Timestamp) >= toStartOfDay(<selected start time>) AND toStartOfDay(Timestamp) <= toStartOfDay(<selected end time>). This resolves an issue where some data would be incorrectly filtered out when filtering on such timestamp expressions (such as time ranges less than 1 minute).With this change, teams should no longer need to have multiple columns in their source timestamp column configuration. However, if they do, they will now have correct filtering.
Testing
Testing the fix
The part of this PR that fixes time filtering can be tested with the default logs table schema. Simply set the Timestamp Column source setting to
TimestampTime, toStartOfMinute(TimestampTime). Then, in the logs search, filter for a timespan < 1 minute.Without the fix, you should see no logs, since they're incorrectly filtered out by the toStartOfMinute(TimestampTime) filter
Screen.Recording.2025-10-27.at.11.08.24.AM.mov
With the fix, you should see logs in the selected time range
Screen.Recording.2025-10-27.at.11.07.40.AM.mov
Testing the optimization
The optimization part of this change is that when a table has a primary key like
toStartOfMinute(TimestampTime), ..., TimestampTimeand the Timestamp Column for the source is justTimestamp, the query will automatically filter by bothtoStartOfMinute(TimestampTime)andTimestampTime.To test this, you'll need to create a table with such a primary key, then create a source based on that table. Optionally, you could copy data from the default
otel_logstable into the new table (INSERT INTO default.otel_logs_toStartOfMinute_Key SELECT * FROM default.otel_logs).DDL for log table with optimized key
Once you have that source, you can inspect the queries generated for that source. Whenever a date range filter is selected, the query should have a
WHEREpredicate that filters on bothTimestampTimeandtoStartOfMinute(TimestampTime), despitetoStartOfMinute(TimestampTime)not being included in the Timestamp Column of the source's configuration.